heartslobbf · 3 years ago
oh also can people stop saying that literally every single tiny fucking thing is ableist. like hi physically disabled neurodivergent person here, able bodied people are really out here claiming that everything ever is ableist in the most performative way imaginable, doing nothing to actually help disabled people and just speaking over us instead, pushing an inaccurate idea of what disability is and how disabled people are. for instance, you all seem to think we’re unintelligent as fuck, and you also think that all physically disabled people? cant go outside? you infantilise us and generalise our needs, when disability is so complex and each individual has a specific set of needs to be met that could be contradictory with another’s. you love to coddle white neurodivergent people, shielding them from criticism for racism by claiming that ‘they don’t understand’, thereby enabling racism from white neurodivergent people and also reinforcing the infantilisation of disabled people as a whole. this also suggests that kids are like? naturally racist ??? literally what the fuck like go outside get off the internet stop pretending to give a shit about disability when all you really want to do is enable racism and still somehow get woke points on twitter. like do i even need to mention how the only activism you see about physical disability from able bodied ppl is saying that the phrase go outside is ableist. you guys go on and on about autism and adhd (not the only disabilities in the neurodivergent bracket anyway) in an uneducated and gross way and just ignore every other fucking disability all while claiming to be suuuuch a good ally. shut the fuck up. stop making jokes about diabetes or pay me a tenner for each fucking one so i can afford better medical equipment
#should clarify with the go outside thing as well that im aware some disabled ppl aren’t able to go out much or even at all in some cases and i have nothing but respect for those people and their feelings about the phrase go outside
#the issue i have with this talking point is that it’s trivial and generalises the experience of disability as though it’s all encompassing
#like NONE of us can go outside. idk it just rubs me the wrong way like actually the image you have in your head of disabled life is so so inaccurate. you see us all as sad and unfulfilled when there are so many mobility aids that we have that make our lives fucking great. idk
#just fed up of twitter treating disability like a one note issue and refusing to understand anything about the community
#why don’t we talk about accessible healthcare why don’t we push for disabled benefits to be improved
#why don’t we address how being disabled forces you into a never ending cycle of bartering with government officials for your needs to be met
#nothing is easy when you’re disabled. there is NO system to cater for us like i just had to wait seven months to get a driving license that able bodied people can get in less than a week. LESS THAN A WEEK. and i had to wait seven months meaning ive missed out on a bunch of opportunity. and i couldn’t contact anyone during those months to find out about the process etc BECAUSE NO ONE CARES
#like literally fuck off shut up don’t blather on about how the word ‘stupid’ is ableist why don’t we talk about
#idk like the fact we don’t have marriage equality for disabled people???????
#activism is about the real fucking world. about doing things within it. and that’s ESPECIALLY important for disabled ppl because it’s about making our society inhabitable for us.
#obviously language and semantics is also important but some of you guys are picking up on the smallest things that no disabled person actually cares about. what we care about is you know being regarded as human by our governments
#shut up daisy
masterweaverx · 4 years ago
Tyrian Callows. I gave a vague, offhand description of what I thought he was in an earlier post, but now I guess it’s time to explain my reasoning. Strap in, we’re getting into the gear of this guy.
Obviously the first thing we want for this build is to incorporate the scorpion tail somehow, both in stabbing and in poison. We’re also going to want a way for Tyrian to manipulate his opponents around the battlefield for his own twisted amusement. And of course he needs to match that with the ability to pit people against each other with just a few well-placed words.
For our point Array, we're actually opening up this build with 15 points in Wisdom. I know I do that a lot, but a lot of the RWBY characters are very perceptive and Tyrian knows just what to say to frighten anybody. 14 points in Dexterity after that, because he's a very capable fighter, followed by 13 points in Strength for purposes of multiclassing. 12 points in Constitution after that, Tyrian's a very durable chap, followed by 10 points in Charisma for that incredible laugh, and let's leave his Intelligence at 8--who needs booksmarts when you have faith in the dark goddess?
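(If you want to check the math, here's a quick sketch - assuming the standard 5e point-buy cost table, which the post doesn't spell out - showing this array spends exactly the usual 27-point budget:)

    # Standard 5e point-buy costs (an assumption; the post never states them).
    COST = {8: 0, 9: 1, 10: 2, 11: 3, 12: 4, 13: 5, 14: 7, 15: 9}
    array = {"WIS": 15, "DEX": 14, "STR": 13, "CON": 12, "CHA": 10, "INT": 8}
    print(sum(COST[s] for s in array.values()))  # 27 - the whole budget, nothing wasted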
Now in my earlier reblog I said Tyrian would be a wildhunt shifter, but rethinking things a bit I think the better choice would be a Simic Hybrid. Simic Hybrids get +2 to their Constitution and +1 to one other ability of their choice, and I’m going to go with Wisdom to even that out. They have 60 feet of Darkvision, and can choose two Animal Enhancements--Nimble Climber gives Tyrian a climbing speed equal to his walking speed, and Carapace gives him +1 to his AC while he’s unarmored. Technically he doesn’t get the benefits of Carapace till level five, but you know, we’re doing a total twenty-level build here so that’s a bit of a semantic complaint.
I still think Tyrian would have the Acolyte background, though, for some fairly obvious reasons, and that gives him proficiency in Insight and Religion, as well as a couple bonus languages. Do the Grimm have a language? Did I just make Tyrian even more terrifying? I dunno. He also technically gets the Shelter of the Faithful feature, so if there’s a church of Salem he can head over there for free healing. Do the Grimm have a religion? Would they worship Salem? These are very important questions we’re just going to ignore.
Tyrian’s getting 4 levels as an Assassin Rogue, 6 levels as a Drunken Master Monk, and 10 levels as a Beast Barbarian. Ordinarily I wouldn’t care about the order of how he gets these, but because of a few multiclassing oddities we need to take the Rogue class first and the Monk class before the Barbarian class. And just to get it out of the way, we get four Ability Score Improvements; I’m going to burn one to get the Resilient feat, giving Tyrian +1 Strength and proficiency in Strength saves, and spend the six points left to get Wisdom and Dexterity both up to 18. So that’s 18 Wisdom and Dexterity, 14 Strength and Constitution, 10 Charisma, and 8 Intelligence, all things told.
Rogues get proficiency with Light Armor, Simple Weapons, and a bunch of specific weapons I’m going to gloss over because multiclassing into Barbarian just gives us proficiency with Martial Weapons and Shields anyway. They also get proficiency in Thieves’ Tools and Dexterity and Intelligence saving throws, and four skill proficiencies; Acrobatics, Athletics, Intimidation, and Stealth are must-haves for Tyrian.
Tyrian’s Rogue levels give him a few noncombat benefits, such as knowing Thieves’ Cant (the secret language of criminals) and Expertise in two skills (I’d go with Intimidation and Acrobatics). Mostly, though, it just gives him a bit of an edge--he can deal 2d6 extra damage with a Sneak Attack, so long as he has advantage on the attack or is flanking the target, and his Cunning Action lets him Dash, Disengage, or Hide as a bonus action. And being an Assassin not only gives him proficiency with Disguise and Poisoner’s Kits, it also means he has advantage on attack rolls against creatures that haven’t taken a turn and that he’s guaranteed a critical hit on any target who’s surprised.
With six levels as a Monk, Tyrian has a base AC equal to 10 + his Dexterity modifier + his Wisdom modifier (+1 for his racial Carapace feature) while not wearing armor, a movement speed of 45 feet, and the ability to reduce falling damage by 30 HP. He also gets Martial Arts, letting him use his Dexterity for attack and damage rolls and roll 1d6 for damage with unarmed strikes or monk weapons (which handaxes count as), as well as make an unarmed strike as a bonus action when he takes the Attack action. He gets an Extra Attack whenever he takes the Attack action, and his Ki-Empowered Strikes count as magical for purposes of resistance. And speaking of Ki, Tyrian has a pool of six Ki points that regenerate on a rest, which he can spend to make a Flurry of Blows, engage in Patient Defense, take a Step of the Wind, Deflect Missiles some girl is shooting at him, or make a Stunning Strike.
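(Quick math check on that AC claim - a minimal sketch using the standard 5e modifier formula, plugging in the final 18 Dex / 18 Wis from the ability score section above:)

    def mod(score):
        # standard 5e ability modifier
        return (score - 10) // 2

    dex, wis = 18, 18                    # final scores from this build
    print(10 + mod(dex) + mod(wis) + 1)  # 19 unarmored AC, the +1 from Carapace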
Of course, as a Drunken Master Monk (or more accurately a Complete Nutjob Monk), Tyrian gets a few more perks. Whenever he makes a Flurry of Blows, he can Disengage for free and gets +10 to his speed. He can Leap To His Feet from a prone position using only 5 feet of movement, and he can spend a Ki point to redirect a failed Melee attack to somebody next to him, which with his absolutely ridiculous AC is likely to happen. And of course, Tyrian gains proficiency with Brewer’s Supplies, which means he can make you a drink and poison it too! No wonder he’s Qrow’s nemesis...
Ten levels in Barbarian just make a terrifying man that much more terrifying. Tyrian can enter a Rage as a bonus action, though maybe it's better to call it a Fervor for him. This gives him advantage on Strength checks and Strength saving throws, +3 damage to any Strength-based melee attack, and resistance to bludgeoning, piercing, and slashing damage. A Rage can last for one minute, but it ends early if Tyrian doesn't attack anybody by the end of his turn. Luckily he can Rage four times per long rest. He can also make a Reckless Attack, giving him advantage on Strength-based melee attack rolls for a turn at the cost of giving others advantage against him. And Brutal Critical lets him add one extra damage die to a critical hit.
Tyrian also gets a Danger Sense, giving him advantage on Dexterity saving throws against anything he can see, and Feral Instinct, granting him advantage on initiative rolls and the ability to act while Surprised if the first action he takes is to enter a Rage. Finally, we're swapping out Fast Movement for the alternate feature Instinctive Pounce; this means Tyrian can lunge up to a creature that ends their turn within fifteen feet of him as a reaction, without provoking attacks of opportunity.
Of course the big reason we're taking so many levels in Barbarian is to get the Beast Primal Path, which will finally give Tyrian his signature tail. Granted, he only gets it when he Rages because of how the game mechanics work, but his tail deals 1d12 piercing damage on a hit and has the reach property. The Bestial Soul feature lets him swap between being a fast swimmer, powerful jumper, and ridiculous climber every rest; honestly, he's got good jumps, so that's probably his default. And of course there's Infectious Fury, which forces a Wisdom save on anybody he hits with his tail; if they fail, they either have to make a melee attack against another creature Tyrian picks, or they take 2d12 psychic damage.
All this combined makes Tyrian hard to hit and gives him loads of ways to hit hard, as well as granting him options to make people hit other people and get in and out of fights as he so pleases.
So yeah, that’s Tyrian.
vocalfriespod · 7 years ago
A Sirious Problem Transcript
MEGAN: Hi, welcome to the Vocal Fries Podcast, the podcast about linguistic discrimination.
CARRIE: I'm Carrie Gillon.
MEGAN: And I'm Megan Figueroa. And today we are going to be talking about artificial intelligence generally and more specifically automatic speech recognition. We have a guest here with us, because Carrie and I do not know anything about - well, ok, I shouldn't speak about Carrie’s ignorance on the topic - but I don't know
CARRIE: You were correct: I know nothing.
MEGAN: Ok. We are joined by Dr. Rachael Tatman. She is a data preparation analyst at Kaggle, which is, according to its own Twitter account, the world's largest community of data scientists. Rachael has a PhD in linguistics from the University of Washington, where she specialized in computational sociolinguistics. Her dissertation, among other very cool things, showed the ways in which automatic speech recognition falls short when dealing with sociolinguistic variation, like dialects. Welcome Rachael.
RACHAEL: Hi! Thanks for having me.
CARRIE: Hi!
MEGAN: I'm very excited to have you. I feel like, with automatic speech recognition - I don't know if other people feel this way - but I was in the camp where I didn't realize that I should care about what's happening, with how automatic speech recognition is being made or to listen to voices. I didn't know that I had to care, and now I care. Hopefully we’ll show listeners why we should care.
RACHAEL: Yeah! I can share one of my stories about automatic speech recognition. One thing that's really difficult is children's voices, because obviously children are a different size, and they have a lot of acoustic qualities that are different. But also children have a lot of individual variation. If you spend a lot of time with kids, what's a “bink bink”? Is it a blankie? Is it a bottle? I'm Dyslexic, and when I was in grade school, they tried to use automatic speech recognition to like help me type faster, so I could complete assignments and turn them in. And not fail third grade. Yeah: it did not work well. I remember very distinctly that I tried to say “the walls were dark and clammy”. We were doing a creative writing exercise, and it was transcribed as “the wells we're gathered and planning”. Which is kinda close acoustically, but also there's some probably poor language modeling behind that, where they thought that that was a more likely sentence than the one that I'd started with.
CARRIE: Wow.
MEGAN: Let's define automatic speech recognition for the listeners, and for myself. What is automatic speech recognition?
RACHAEL: It's the computational task of taking in an acoustic signal of some kind and rendering it as text. When I say an acoustic signal, I mean specifically a speech acoustic signal, because people also work with whale song and bird song and stuff. It gets used a lot in especially mobile devices. If you know Google Now or Cortana - I don't know how many people actually use Cortana - or Bixby, which is Samsung's virtual assistant, or Siri, which is probably the most well-known one, they all rely on automatic speech recognition to sort of understand what you're saying and reply to your requests. It gets used a lot in virtual assistants, like the Echo or Google Home, or Apple's launching one soon, as well. I don't know how much you guys keep up with tech news, but these are little devices that sit in your home, and you can be like, "hey, Siri". I guess, I don't know what the Apple one's gonna be called. Or, "okay, A L E X A". I don't want to say it, because I don’t want to turn on everybody’s Alexa.
CARRIE: Oh no! I think you just did!
ALEXA: Hmm. I'm not sure what you meant by that question.
RACHAEL: Go back to sleep Alexa. It's everywhere, is the point. People are incorporating into new technologies. They're getting really excited about it. People are talking about incorporating it into testing for schools, for standardized testing. People are talking about incorporating it into medical diagnostic tests. Things like - what's that a semantic one, where you have to name a bunch of things that are similar, before you move on?
CARRIE: I don't know.
RACHAEL: It gets used for diagnosing a lot of things, like schizophrenia and Alzheimer's and specific learning disorders. Semantic coherence test maybe?
CARRIE: Yeah.
RACHAEL: Anyway, people have been working on using speech recognition for that, so incorporating it into this. People are using it for language assessment, for immigration and visas, a lot of very high stakes places.
MEGAN: That's very high stakes! That's very important.
RACHAEL: Probably my favorite thing to be upset about in this realm is people incorporating NLP, which is natural language processing, which is more as text, and also automatic speech recognition, in these algorithms that you put information into and it tells you whether or not you should hire the person.
CARRIE: UGH. Oh my god.
RACHAEL: So very, very high stakes applications. You may not always realize that your voice or your language is being used in this way.
MEGAN: You can't see my face, but I'm horrified right now. Okay. It's very important. There's a lot of practical applications that automatic speech recognition is being used for. In all of these realms, there's possibility of discrimination.
RACHAEL: Yeah. As far as I know, no one who has looked at an automatic speech recognition system or a text-based system, specifically looking at performance across different demographic groups on a certain task, has ever found that "nope there's no difference, it doesn't matter, the system is able to deal super well with people of all different backgrounds". Looking specifically at speech, I've done a number of studies, and by a number I mean two, and I'm working on a third off and on, because I am also working full-time - this isn't part of my job. I'm not speaking on behalf of my company or employer. If you're gonna yell at anybody, yell at me personally. This is my private, individual thing. What I found is that there are really, really strong dialectal differences - so differences between people who have different regional origins. Which dialects get recognized more or less accurately seems to be - I'm having a hard time picking it apart, but I think it also is a function of social class. It's fairly difficult to find speech samples that are labeled for the person's dialect and also their social class, with good sociolinguistic sampling methods. It's really hard to find large annotated speech databases that you can do this analysis with, but I found really strong dialectal differences in accuracy, with general American, or mainstream American English, or mainstream US English, or standardized American English - there's a lot of different terms for this "fancy" talk - having the lowest error rate. I found that Caucasian speakers have the lowest error rate, looking at Caucasian speakers, African American speakers, and speakers of mixed race in the study where I had race information. I only had one Native American speaker, so I had to exclude them, because one data point is not a line. So that's worrying.
MEGAN: Right. What does it mean to have an error? What is the practical result of an error in speech recognition?
RACHAEL: There are three types of errors. One is where a word is substituted, so you say “walls” and it hears “wells” and transcribes that. Another one is deletion, where you say something like “I did not kill that man” and “I did kill that man” is transcribed. I should say people are still using hand stenographers for court cases, as far as I know. I don't think anyone in the legal system is using ASR, but yikes.
CARRIE: Better not.
RACHAEL: There’s also insertion, when you think that you heard a word and it wasn't actually there. A lot of times words that’re inserted are function words like “the” and “of”, things like that.
MEGAN: So deletion, insertion, and hearing it wrong. Doing another word.
RACHAEL: Yeah those are the only three transformations you can do, yes.
MEGAN: Okay.
RACHAEL: Word error rate is just, for all the words, how many of them did you get wrong in one of these ways. Just on a frustration level, if you're using speech recognition as a day-to-day user, and it doesn't work real great, that's annoying. I'm sure if you guys ever use speech recognition, like on your phones, or I have a Google home, and I'll use it for a timer a lot. It's actually gotten better - it used to be really bad at hearing the word “stop” like “stop the timer”. I think that might be because of the [ɑ] [ɔ] merger that some people have. That's my pet theory. But it's gotten a lot better at understanding “stop”. I would have to say “stop” five times while I'm standing at the kitchen with cheese smeared on my arms up to my elbows or whatever.
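[A toy sketch of the word error rate computation Rachael describes, run on her "walls"/"wells" example from earlier - illustrative code from us, not anything the show or any vendor actually uses:]

    def wer(ref, hyp):
        # word error rate: (substitutions + deletions + insertions) / reference length,
        # via a standard Levenshtein dynamic program over words
        r, h = ref.split(), hyp.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i                                          # i deletions
        for j in range(len(h) + 1):
            d[0][j] = j                                          # j insertions
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                d[i][j] = min(d[i-1][j-1] + (r[i-1] != h[j-1]),  # substitution (or match)
                              d[i-1][j] + 1,                     # deletion
                              d[i][j-1] + 1)                     # insertion
        return d[len(r)][len(h)] / len(r)

    print(wer("the walls were dark and clammy",
              "the wells we're gathered and planning"))  # 4 substitutions / 6 words = 0.67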
CARRIE: That's really strange because there isn't a different “stop”. I have the [ɑ] [ɔ] merger, so I can't make the other word, but it doesn't exist anyway.
RACHAEL: Yeah, it may be that the acoustic model is more - so speech recognition, I'm gonna say this generally, because people are futzing around with it a lot and I'm messing it up - generally has two modules. One is the acoustic model, which is "what waveforms map to what sounds" and the other is the language model, which is "what words are more likely". When you put those together, out the other end comes, through some fancy math, the most likely transcription for some given set of input parameters, ideally. And my guess is that if you're not specifically modeling the fact that some people have two vowels and some people have one vowel in that space, you may be less able to recognize those sounds generally, because you think that there's just a lot of variation there. Especially since there's also the Northern Cities Shift that's muddling that whole area as well. Sorry, should I assume a lot of phonetic background on the part of your speakers?
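[A toy sketch of that two-module decode: add the acoustic score to the language model score and take the best total. All the numbers are invented, purely to show how a skewed language model can outvote a better acoustic fit, as in Rachael's "wells we're gathered and planning" story:]

    # made-up log-probabilities for two candidate transcriptions
    candidates = {
        "the walls were dark and clammy":        {"acoustic": -12.0, "lm": -11.5},
        "the wells we're gathered and planning": {"acoustic": -12.6, "lm": -9.0},
    }

    best = max(candidates, key=lambda w: candidates[w]["acoustic"] + candidates[w]["lm"])
    print(best)  # the language model outvotes the (slightly better) acoustics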
CARRIE: Our listeners? Yeah, I was just gonna say: maybe we should describe what the Northern vowel shift is.
RACHAEL: There are a number of vowel shifts in the United States, and if you think of individual vowels as being little swarms of bees that are clustered around flowers, sometimes the swarms of bees move on or the flower moves and the swarm follows after it, and different places have movement in different directions. I don't know, is that a good analogy? I'm using my hands a lot. I know you guys can't see it. Is that clear?
CARRIE: I understand what you're saying but I'm not sure. Good question.
MEGAN: I don’t know. I like the analogy. I feel like that's good.
RACHAEL: I would look up vowel change shifts, if I was listening to this. I’d just google them, and you'll see some nice pictures and arrows. You’ll be like “oh!”
CARRIE: Yeah. We’ll add something to the Tumblr to explain a little bit about vowel shifts, and also the merger we were talking about, because I can't replicate it. I can't do that open o [ɔ].
MEGAN: I can’t either. I don’t have it.
RACHAEL: “cot” [kɔt] as in “I caught the ball” and then “caught” [kɑt] - nope, I have it backwards again.
CARRIE: Yep. We haven't asked you yet, but what is computational sociolinguistics?
RACHAEL: I don't think I made up the term, but I'm probably one of the first people to call myself that. Dong Nguyen - she's currently at the Alan Turing Institute - has a fabulous dissertation that has a really nice review chapter that talks about the history of this emerging field. It is approaching sociolinguistic questions using computational methods, and it's also informing computational linguistics and natural language processing and automatic speech recognition with sociolinguistic knowledge. Working on dialect adaptation, I think would fall within that - that's when you take an automatic speech recognition system that works on one dialect and try to make it work good for other dialects as well. I've done some work on modeling variation in textual features by social groups. I've looked at political affiliation and punctuation and capitalization in tweets, and there's pretty robust differences at least in the US between oppositional political identities. I'm trying to think of other people's work, so it's not just: here's a bunch of stuff that I've done!
MEGAN: Basically, everyone's trying to model everything.
RACHAEL: Basically. Or should be, hopefully. I think, historically, there hasn't been a lot of - I think sociolinguists are much better about knowing what's going on in computational linguistics than computational linguists are at knowing what's going on in sociolinguistics. I'm coming from sociolinguistics and coming to computational linguistics. I'm trying to have a big bag of Labov papers and toss them to people, be like "here you go! Here you go!"
MEGAN: Yes and Labov is a very famous sociolinguist.
RACHAEL: He is, yes. I would call him the founder of variationist sociolinguistics - which is not the only school, but it is the school that I work in mainly.
CARRIE: Yeah, I think that's - well that's the most famous one as far as I know.
MEGAN: Yeah. I didn't know there were other ones. Of course there is.
RACHAEL: Yeah, I'm trying to think of names. Mostly I'll come across it I'll be like “oh”. I guess discourse analysis is a type of sociolinguistics.
MEGAN: Oh, okay.
CARRIE: Yes.
RACHAEL: But different bent.
CARRIE: How is automatic speech recognition trained to understand humans? I think you've already started to answer this, but maybe you can answer it even more fully, if there is more to say.
RACHAEL: Yeah. I mentioned there are two components: there's the acoustic model and then there's the language model. Usually the language model is actually trained on texts. You take a very, very, very large corpus. I think right now - I don't know about the standard, but what I think most people would like to use would be the Google trillion word corpus, which is from scraped web text, or people use the Wall Street Journal corpus, which is several hundred million words long. You know the probability of a certain set of words occurring in a certain order, so it's the poor man's way of getting syntax. I'll tell you about how it's traditionally done. People are replacing both the pronunciation dictionary and the acoustic model - which sometimes includes the pronunciation dictionary - with big neural nets. We can talk about that in a little bit, but traditionally the pronunciation dictionary was made by hand. The Carnegie Mellon pronunciation dictionary, or CMU pronunciation dictionary, is probably the best-known one for American English. People transcribe words, and if there's one that you need that's not transcribed, you add it.
MEGAN: And what’s a pronunciation dictionary?
RACHAEL: It is a list of words and then how they're pronounced. The phones, so “cat” would be [k] [æ] [t] - those three sounds in order. Then the acoustic model takes the waveform and tells you the probability of each of those sounds. So it's like “well I'm pretty sure it's [æ], but I guess it could also be [ɑ]”, through a process of transformations. People recently have been taking a speech corpus - usually one that's labeled, so you know what words are spoken - and then using all of that data and shoving it into a neural net, which is a type of machine learning algorithm - it's a family of machine learning algorithms. People use different types and flavors, and they have different structures. What neural nets are really, really good at is finding patterns in the data, and recognizing those same patterns later, without you having to tell them to do it. They learn it themselves, from just the way that the information is organized. They've been really, really good and useful in image processing, in particular, being able to look at a photo and be like “here is an apple”, “here is an orange” and “I have circled them helpfully for you”. They're really good at that. But as it turns out there is more structure in language than there is in other types of data.
CARRIE: Shocking. [sarcasm]
RACHAEL: It is to some people. I've had a lot of frustrating conversations where people were like “but it works really good on images!” I'm like “yes, but language is different”. If it weren't, we wouldn't need linguistics. People wouldn't need to study language their entire lives, if it was just like images but in sound, basically. Which I think is probably not news to any listeners of this podcast, but definitely it is news to some people. Neural nets are really good at seeing things that they've seen before, or identifying the types of things they've seen before, and if they see new things, they're not so good at it. I think that's really where a lot of the trouble with dialect comes in, because sociolinguistic variation is very systematic between dialect regions. One person can have multiple dialects as well. I don't want to make it sound like you sort people into their dialects and then apply the correct model and then boom everything's correct all the time. Because people have tried that and it works better than not doing anything, but it's still not - I don’t know. There's a lot of work to do, and I don't want to make it sound like speech research engineers are just fluffing around and not knowing about language, because they do. But it's difficult, and it hasn't, I think, been a major focus for a lot of people recently, and I'm hoping that it will become more of a research focus.
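[The CMU Pronouncing Dictionary mentioned a moment ago is free to poke at yourself; here's a minimal sketch via NLTK's copy of it - this assumes you've installed NLTK and run nltk.download('cmudict') once:]

    from nltk.corpus import cmudict  # pip install nltk, then nltk.download('cmudict')

    pron = cmudict.dict()            # word -> list of ARPAbet pronunciations
    print(pron["cat"])               # [['K', 'AE1', 'T']] - the [k] [æ] [t] above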
MEGAN: You said something in one of your interviews that I wanted to read here, that I liked. You say that "generally the people who are doing the training aren't the people whose voices are in the dataset. You'll take a dataset that's out there that has a lot of different people's voices, and it will work well for a large variety of people. I think the people who don't have sociolinguistic knowledge haven't thought about how the demographics of the people speaking would have an effect. I don't think it's maliciousness. I just think it wasn't considered."
RACHAEL: Yeah.
MEGAN: I think the "it wasn't considered" part - that's how I felt actually. I obviously very much care that people aren't discriminated against in every aspect of life. But I just didn't think about speech recognition.
RACHAEL: Yeah. I think we have this idea that like “oh a computer’s doing it, so it's not gonna be biased”.
MEGAN: You’re right.
RACHAEL: That’s nice to believe that you have the ethical computer from Star Trek, but bias is built into all machine learning models. It's one of the things you study in a machine learning class. You talk about bias and variance, and it's there in the model, and it's there in the data. Pretending that it can go away if you just keep adding more data is a little bit of a problem for the people who are actually using the system, and it doesn't work as well for them as it should, maybe.
CARRIE: It's also very naïve.
MEGAN: Yeah. Humans are the ones that are doing it, right. We’re behind the machines. Of course there's biases. I was thinking, I've said I've never thought about this before, but I don't use Siri, because Siri does not understand me very well at all. I've given up.
SIRI: I miss you Megan.
MEGAN: I didn't take the next step. I didn't take the next step, and think “oh why is this the case that she's not understanding me very well”.
RACHAEL: Yeah.
CARRIE: She understands me pretty well. I have a pretty standard North American accent.
RACHAEL: A little bit of the Canadian shift.
CARRIE: I do, but it's not enough to trick Siri, apparently. My accent has shifted somewhat since living in the States for over nine years. I knew that speech recognition did have a problem with at least some dialects, because there's a fairly famous skit from Burnistoun, the Scottish sketch comedy show, where they're just saying "eleven", and "eleven" in a Scottish accent is pretty close to the standard, so the speech recognition should have been able to pick it up. Most of the sketch is them speaking in a Scottish dialect that I think many Americans would not understand actually.
IAIN CONNELL: You ever tried voice recognition technology?
ROBERT FLORENCE: No.
IAIN CONNELL: They don't do Scottish accents.
ROBERT FLORENCE: Eleven.
ELEVATOR: Could you please repeat that.
ROBERT FLORENCE: Eleven.
IAIN CONNELL: Eleven.
ROBERT FLORENCE: Eleven. Eleven.
IAIN CONNELL: Eleven.
ELEVATOR: Could you please repeat that.
IAIN CONNELL: Eleven. If you don't understand the lingo, away back home your own country. [If you don't underston the lingo, away back hame yer ain country.]
ROBERT FLORENCE: Oohh, is the talk now is it? “Away back home your own country?” [Oh, s'tha talk nae is it? "Away back tae yer ain country"?]
IAIN CONNELL: Oh, don't start Mr Bleeding Heart – how can you be racist to a lift? [how can ye be racist tae a lift?]
ELEVATOR: Please speak slowly and clearly.
CARRIE: Anyway, it's a really funny sketch, if you haven't seen it. I will post it, because I think it's funny.
MEGAN: I don't know what it is about me. I don't know if vocal fry would affect it at all. I'm also kind of mumbly. I try not to be mumbly on the podcast obviously, but in my normal everyday life, I am a mumbler, so that might be it. I expect Siri to understand my mumbles, but she don't, so I gave up.
RACHAEL: But see, that's part of the problem, because - I don't know for sure, but I would be beyond shocked if they didn't - I know that for sure Google retains the speech samples that you send them, and I'm sure that they fold them back into their training data, so if you're not using it, because it doesn't understand you, it's pretty much never gonna understand you, is the unfortunate thing. I think that's really part of the reason that there's - I think - pretty strong class effects. This is me having a science hunch that I haven't really banged out yet in some experimental work. I think that people who have a higher socioeconomic status and particularly professional class, mobile - not rural, the other one.
CARRIE: Urban.
RACHAEL: Urban! Yeah, thank you. Especially professional, mobile, urban people have - I'm almost positive - higher recognition rates, correct word rates.
MEGAN: You mentioned something about how the language model was taking in things like The Wall Street Journal. Wouldn't that affect it too? That's not your acoustic signal, but it's the way you speak? I don't know.
RACHAEL: Yeah. No that's fair. “‘Fiduciary’ seems to be a fairly common word that humans use all the time, so I’m gonna look for that one.”
CARRIE: I would be very surprised if class didn't play a role. It always does. In everything that we talk about, there's something about class going on too. But we don't think about it as much in North America as we should.
MEGAN: We really don't. Especially since it's wrapped in with race and ethnicity so much. I act like I know anything beyond the States. It's just very American.
RACHAEL: I think it's very much the top-level thing that people think about with language variation in the UK, for sure.
MEGAN: Ah, okay.
CARRIE: Yeah. Absolutely.
MEGAN: Interesting.
RACHAEL: There's RP, and then those weird regional dialects that we don't like. As a person not from the UK, that's the judgments that I've gotten from consuming popular media.
CARRIE: It used to be worse. Because the BBC used to only have received pronunciation with their reporters, but now you'll hear regional varieties. Still the most prestigious versions of those varieties, but at least you'll hear Irish dialects now. Things are slightly better.
MEGAN: You'd hope so. ASR is trained to understand humans, so you're feeding them these datasets, and I didn't know this, but I guess, like you said, if I talk to Siri, I'm also feeding into a dataset.
RACHAEL: Yeah. That seems very likely to me. Again, I don't know for sure, and this may be something that's Googlable, that you could find using a search engine, and it may be something that you could not find using a search engine. The other thing about neural nets is, because they're good at seeing things they've seen before, they get really good if you have a lot of data, a lot of data. I have not yet seen the company that would ignore free data that people were giving to it to improve model performance.
MEGAN: Do you have examples of automatic speech recognition failing to understand people that we can give the listeners, so they can see the problem?
RACHAEL: I can give you one from my life, which continues to drive me nuts. I'm from the South and I have a general American professional voice that I use, but especially if I'm relaxing with friends or with my family, I definitely sound more Southern. One of the things that happens in the South and also in African American English is nasal place assimilation. If you have a nasal before a stop - stops are sounds like [k] [t] [p] [g] [d] [b] - the nasal, [m], [n], or [ŋ], changes to match whatever comes after it. I would say "beanbag" as "beambag", especially in an informal setting. Or a "handbag" is "hambag". Put your things in your handbag. I think it's a fairly common thing. Google used to always, always, always search for "beambag" when I wanted to know about "beanbags", because I was doing research to get - I currently have one, I just turned to look at it - a really good beanbag chair. They're very comfy! I like them. It kept telling me about "beambags", which are not a thing! It just drove me up the wall, because lots of people do this thing. This is a normal speech process.
CARRIE: Yeah. Very common.
MEGAN: Also, a “hambag”, a bag of hams and that might be something people have.
RACHAEL: I guess a Smithfield ham does come in a bag. It comes like a little canvas bag.
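[A toy rewrite-rule version of the nasal place assimilation Rachael describes above - a nasal taking on the place of a following labial stop, with the /d/ in "handbag" dropping out. Real phonology is gradient; a regex is only a cartoon of it:]

    import re

    def assimilate(word):
        # n (optionally with a d after it) before a labial stop becomes m
        return re.sub(r"nd?(?=[bp])", "m", word)

    print(assimilate("beanbag"), assimilate("handbag"))  # beambag hambag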
MEGAN: I guess that's where it's trying to get you. But that's not what you’re [meaning]. That’s funny. Okay, how do we solve this problem? What should we be thinking about when we develop automatic speech recognition databases and such? Who should be involved?
RACHAEL: Sociolinguists. Definitely hire sociolinguists. That's my general go-to drum. It's a hard problem. I don't want to pretend that a sociolinguist looks at it and they're like "ah! Fix this parameter!" and then suddenly it works great for everyone. Because the fact of minority languages or language varieties, in particular, is that they're minority because fewer people use them. If you are trying to optimize performance and accuracy for the model as a whole, and you raise it for the people who are from minority groups - whatever those may be - if you are using the one model, that will lower it for your majority language speakers. Just adding more data isn't necessarily going to be the fix. People have been working on this for a long time, and it's a very hard problem, and I have nothing but respect for everyone who's working on this. There's a couple of approaches that people are taking. One is to train multiple models on different stable language varieties. In the US I might train one on the West Coast generally, insofar as that is a single language variety, and I'd probably train one on the Northeast, one on the northern cities, so Chicago, Michigan sort of area - Chicago's in Illinois - Illinois, Michigan sort of area. One on the South. One also for the mid-Atlantic region. And then select one of those models, based on whichever would most accurately represent the person who's speaking. That's one approach. Another approach is to take the model and then change it for every single person's voice. That will capture dialectal variation, but it will also capture individual variation. The reason that your phone doesn't do that automatically is because it is very computationally intensive. These models are very big. They have a lot of information in them. They have a lot of parameters, and to change those, it takes a lot of raw processing power. That's not really feasible to do for individual people, as it stands. I don't know, maybe in five years it will be completely feasible. We'll all have GPUs falling out of our pockets everywhere we go. I don't know. That's another approach that some people have taken. I don't know, maybe with some fancy new ensembling - which is where you take multiple different types of models and stick them together like - what are those, K'nex? - and they build a pipeline, and then you shove the data all the way through the pipeline, and all the different models that are connected together. Those have been getting really good results lately, so maybe some sort of clever ensembling, where you do something like demographic recognition, and then something like shifting your language model a little bit. I don't know. I don't know. I don't know what people are gonna come up with.
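[A toy sketch of that first approach - one recognizer per stable variety, plus a router that picks whichever model best fits the speaker. Every model name and number here is hypothetical:]

    # stub recognizers standing in for real per-dialect models
    MODELS = {
        "west_coast":      lambda audio: "<west coast transcription>",
        "northeast":       lambda audio: "<northeast transcription>",
        "northern_cities": lambda audio: "<northern cities transcription>",
        "south":           lambda audio: "<southern transcription>",
        "mid_atlantic":    lambda audio: "<mid-atlantic transcription>",
    }

    def classify_dialect(audio):
        # stand-in for a real dialect classifier; returns made-up posteriors
        return {"west_coast": 0.10, "northeast": 0.10, "northern_cities": 0.15,
                "south": 0.55, "mid_atlantic": 0.10}

    def transcribe(audio):
        scores = classify_dialect(audio)
        return MODELS[max(scores, key=scores.get)](audio)  # route to best-fitting variety

    print(transcribe(b"raw audio bytes"))  # <southern transcription>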
MEGAN: This is the future. This is the future that millennials want or something. I don’t know. This is the future liberals want. If this is the future, I'm thinking about the fact that in 30 years we're gonna be a majority-minority country. We're on our way to this becoming a bigger and bigger problem.
RACHAEL: Yes. Definitely.
MEGAN: The fact that Siri or Alexa - sorry - has trouble understanding people that aren't in this white -
RACHAEL: Super-privileged, small group?
MEGAN: Yeah, right. There's a gender bias too, right? It's males that are understood.
CARRIE: And we're the majority.
RACHAEL: I just want to quickly intercede here - some earlier work of mine found that it was more accurate - specifically, YouTube's automatic captions were more accurate for men than women, but I think, because I couldn't replicate that result, the problem there was actually signal-to-noise ratio. Women tend to be a little bit quieter, because we're a little bit smaller. If you are speaking at the same effort level in the same environment, there's just gonna be a little bit more noise in the signal for women, because we're not quite as loud. I don't know that cleverer signal processing can fix that. I'm gonna keep working on this, and I might find out that actually there are, you know, really strong differences, that maybe it can't deal with things that women do more. I was gonna say "vocal fry", but I've seen no evidence that women fry more than men, which I'm sure you talked about. At length.
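[The signal-to-noise point as arithmetic - same room noise, slightly quieter talker, lower SNR. Illustrative numbers only:]

    import math

    def snr_db(signal_power, noise_power):
        return 10 * math.log10(signal_power / noise_power)

    print(snr_db(1.0, 0.05))  # louder talker:  ~13.0 dB
    print(snr_db(0.6, 0.05))  # quieter talker: ~10.8 dB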
CARRIE: Right. That was our first episode. Everybody does it. Leave us alone!
MEGAN: Leave it alone. Get the fuck off my vocal fry! What I'm hearing is this is something that we should all very much care about, because, like Carrie said, everyone else is the majority. If it's best trained on white men that are in higher socio-economic classes, that's not the majority. It sounds like we need to have people in the room, because, like you said, you don't think it was considered when they were making these datasets. We need people in the room that are like “wait, I come from this community where that's not how we talk, this is not gonna work for me or us”.
RACHAEL: Yeah, definitely.
MEGAN: I definitely want to plug a representation too. We need more people in the room.
RACHAEL: Definitely. I've been talking about English, because that's what I know about, and specifically American English. I don't want to get into British dialectology, cuz that's crazy, crazy complex. But this is also a problem in other languages. Arabic dialects are incredibly different from each other.
CARRIE: Right.
MEGAN: Now I'm thinking about people that are bilingual.
RACHAEL: Or bidialectal.
MEGAN: Or bidialectal, for sure. That's gonna be something else that we would want automatic speech recognition to recognize.
RACHAEL: Yeah. Absolutely. I can give people something that you can do right now - is that Mozilla, which is the company that owned the Firefox - continues to own, I think, the Firefox web browser - is currently crowdsourcing a database of voices, and voice samples. You can head over to that website, for which there is a link that I for sure can't find. I think it's called the Mozilla Common Voice Project, but don't quote me on that unless it's right.
MEGAN: We'll put it somewhere.
RACHAEL: Mozilla is doing a collection of voices of people, and they're specifically trying to get people from different demographic backgrounds, for specifically this problem - for knowing demographic information about someone, for having speech samples for them. They're also having people manually check the recordings, so if this is something that's interesting, and you want to listen to a lot of voices, I'd recommend heading over there and checking it out.
MEGAN: Ah, so they are crowdsourcing automatic speech recognition. That's a good idea. That's tough though - how do you get the most variation in the people that reply?
RACHAEL: One thing that I found in my own work, and other computational linguists have found as well, is that we know a lot about variation in speech, but a lot of the same variation also exists in text. A lot of the text that you produce in your day-to-day life, especially if it's anywhere online, is getting fed into a lot of natural language processing tools. There are also problems with those. Things like identifying what language someone is using is not as good.
CARRIE: Yeah, I notice that on Twitter a lot. It wants to translate from French all the time.
MEGAN: Yeah.
RACHAEL: Twitter's language ID is a hot mess. A hot mess.
CARRIE: And it's never French. It's never French. In fact, sometimes it's English. I'm like “what is going on?”
MEGAN: I've had Estonian. Translate from Estonian.
RACHAEL: Yeah. Estonian tends to show up a lot. I'm trying to think of - I have started doing some very lackadaisical data collection. I think it seems to work on a character level, so it tends to be fairly good at languages that have a unique character set. It tends to be very good at Thai, but related Germanic languages - pfft - it does not. That's Bing. That's on Microsoft. They're the back end there, so I 100% blame them. Maybe, if they hadn't gutted their research teams, they would be able to do this better.
CARRIE: Hint hint.
MEGAN: That is something that we can do immediately. Do you have something really poignant you want to say about why this is all important? What's the takeaway message? Because we've been talking us this whole time about why it's important, but what do you think is the takeaway?
RACHAEL: It's important to hear people's voices. Both literally and metaphorically.
CARRIE: There we go. There's the money shot.
MEGAN: That’s the money shot. Money, money, money. See that's what we wanted!
CARRIE: Yes. It's important to hear people's voices. I think that's a good place to end.
MEGAN: Yeah, cuz that was it. Unless you have anything else, Rachael?
RACHAEL: Hmm. No, I don't think so. I use my hands a lot, so hopefully a lot of the things that I was saying with my hands I was also saying with my voice.
MEGAN: Yeah, I realized that at our first episode, I was using my hands, and now my hands don't even move. It comes with some experience - of my four episodes that I have done, five episodes.
CARRIE: Five! Five episodes. This is our sixth.
RACHAEL: Ooh! Lucky number 6!
CARRIE: Thank you so much, Rachael, for talking with us today.
RACHAEL: You’re welcome!
CARRIE: That was awesome. I learned a lot.
MEGAN: I know, I learned so much. I was so ignorant on this subject. So thank you. Hopefully this will be of interest to people that have no idea, but also to our listeners that really like speech recognition stuff. I know that they're there. This is very exciting. Alright, cool. I guess we want to leave everyone with one message, which is: don't be a fucking asshole.
CARRIE: Don't be an asshole. Bye!
CARRIE: The Vocal Fries Podcast is produced by Chris Ayers for Halftone Audio. Theme music by Nick Granum. You can find us on Tumblr, Twitter, Facebook and Instagram @vocalfriespod. You can email us at [email protected].
douchebagbrainwaves · 7 years ago
YOU GUYS I JUST THOUGHT OF THIS
But you can't eat paper. Don't try to make a new web-based email program, they'll get their asses kicked by a team of qualified experts and tell them about it, they'll be able to say what the most important factor in the growth of mature economies—that is who Jessica Livingston is. Startup founders are naturally optimistic. Less fortunate startups just end up hiring armies of people to sit around having meetings. Slashdot readers now take it for granted that a story about a patent will be about a bogus patent. But it's lame to clutter up the semantics of the language, which could in principle be written in terms of these fundamental operators. A team that outplays its opponents but loses because of a bad decision by the referee could be called unlucky, but not as many more as could. You just try it. Actually, it's more often don't worry about this; worry about that instead. Maybe I'm excessively attached to conciseness. Unfortunately this is just a metaphor, and not a useful one to most people outside the US. This essay is derived from a talk at the 2006 Startup School.
So while they're often nice guys, they just can't help it. So much for hockey as the game is played now. So probably the limiting factor on the number of startups is not the established players, but other startups you don't know who needs to know it. There was a friend they wanted to hire with the investor money, and by that point the innovation that generated it has already happened. And yet this startup is obviously going to succeed: their traffic and revenue graphs look like a jet taking off. It was like watching a car you're chasing turn down a street that you know has no outlet. An optimism shield has to be non-obvious. Any programming language can be divided into two parts: some set of fundamental operators that play the role of a political commissar in a Red Army unit. If your startup is doing a deal, just assume it's not going to apply for patents just because everyone else does is not like saying I'm not going to happen in the future. Instead he'll spend most of his time talking about the noble effort made by the people who run them. The problem with software patents is an instance of a more general rule: make users happy.
They'd seem very impressive. That's the part that really demands determination. At least, that's how we'd describe it in present-day terms. And present union leaders probably would rise to the occasion if necessary. We know that everyone will drive flying cars, that zoning laws will be relaxed to allow buildings hundreds of stories tall, that it will take a conscious effort to overcome it. For some reason this seems to be able to explain in one or two sentences exactly what it does. They were assigned to Viaweb, and became Yahoo's when they bought us.
The real question is, how far up the ladder of abstraction will parallelism go? But Jessica knew her example as a successful female founder would encourage more women to start companies, so last year she did something YC had never done before and hired a PR firm to get her some interviews. And because Internet startups have become so cheap to run, the threshold of profitability can be trivially low. It's also the best route to that holy grail, reusability. You need this for everyone: investors, acquirers, partners, reporters, potential employees, and even though I've studied the subject for years, and we soon dropped the pretense. Startup acquisitions are usually a build-vs-buy decision for the acquirer. The vast majority of people who want to work that hard. The problem with VC funds is that they're the same. And since one person can only manage so many deals, each deal has to be non-obvious. How would the government decide who's a startup investor?
I'd only applied for three. The workers of the early twentieth century must have had a moral courage that's lacking today. Patent trolls, it seems safe to predict they will be very much faster. Object-oriented programming is popular in big companies, because it becomes a filter for selecting bad startups. To make sure, they were willing to take it if offered—partly because there was a great deal of profanity. Her nickname within YC was the Social Radar at interviews wasn't just how we behaved when we built the product. And there is no way I can think of that before? I think it's because some things about startups are kind of counterintuitive. As technologies improve, each generation can do things that the previous generation would have considered wasteful. So why do you need a separate data type?
The one thing he'll never do is stand still. And this will, like asking for specific implementations of data structures, be something that compiler writers think about, but which is usually invisible in the source code of applications? What they'll say is that 99. Languages evolve slowly because they're not really technologies. He even has a sense, when this happens, of wasting something precious. The shielding of a reactor is not uniform; the reactor would be useless if it were. Saying pleased to meet you, whether you are or not. As with gangs, we have some idea what secrecy would be worse than patents, just that we couldn't discard patents for free. You have to be. The third reason patents don't seem to mind a minimal version 1, and yet with the right optimization advice to the compiler, would also yield very fast code when necessary. This is just a guess. For example, I doubt many people at Yahoo or Google for that matter realized how much better web mail could be till Paul Buchheit showed them.
I had to pick the startups. The CEO of Forgent, one of the biggest startup hubs in the world. So, since I'm optimistic, I'm going to number these points, and maybe with future startups I'll be able to say: number four! Be like a running back. We can see this happening already. The puffed-up companies that went public during the Bubble didn't do it just because they were too slow to release stuff, and none because they were too quick. I talk to the founders of the companies we've funded, they all say the same thing. There's a third reason big companies should prefer buying to building: that if they built their own, and they overestimate their abilities. The VCs who say they're going to buy, after all? And the culture she defined was one of YC's most important innovations. A couple months ago I got an email from a recruiter asking if I was interested in being a technologist in residence at a new venture capital fund.
An optimism shield has to be pierced too. To make them stick around you'd have to give them enough that they're not tempted by an offer from Silicon Valley VCs that requires them to move. How do you recognize good founders? The people who really care will find what they want by themselves. The scary thing is, this is the right answer. Semantically, strings are more or less our life. This would encourage what is already the worst trait of big companies filing patent suits against smaller ones, it's usually very faint at first. A company that sues competitors for patent infringement. You have to be secretive internally. Many a founder would be happy to sell his company for $15 million, but VCs who've just invested at a pre-money valuation of $8 million won't hear of that. The economy of medieval Europe was divided up into little cells is terribly inefficient. Their smartest move at that point would have been to take every penny of the $20 million and use it to buy us.